16 research outputs found

    A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

    We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and report its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.

    An annotation scheme for information status in dialogue

    We present an annotation scheme for information status (IS) in dialogue, and validate it on three Switchboard dialogues. We show that our scheme has good reproducibility, and compare it with previous attempts to code IS and related features. We eventually apply the scheme to 147 dialogues, thus producing a corpus that contains nearly 70,000 NPs annotated for IS and over 15,000 coreference links.

    Exploring the boundaries: gene and protein identification in biomedical text

    Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the “open” evaluation and a precision of 0.78 and recall of 0.85 in the “closed” evaluation. Conclusions: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches. Background The explosion of information in the biomedical domain and particularly in genetics has highlighted the need for automated text information extraction techniques. MEDLINE, the primary research database serving the biomedical community, currently contains over 14 million abstracts, with 60,000 new abstracts appearing each month. There is also an impressive number of molecular biological databases covering a

    Information Retrieval Systems Adapted to the Biomedical Domain

    The terminology used in Biomedicine shows lexical peculiarities that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of the techniques used in this domain, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources. Comment: 6 pages, 4 tables

    The Effect of Feature Hierarchies on Frequencies of Passivization in English

    It is my pleasure to thank Joan Bresnan, my principal advisor, for making this thesis seem manageable, and for pointing me in directions for research, and for consistently being enthusiastic and suggesting solutions to problem after problem, and equally, for suggesting problems and keeping me aware of the issues involved. And particularly for always making me feel more comfortable, in classes and meetings and a number of other situations, to express myself freely, and for handling it gracefully when I was incoherent. I was lucky to be able to work with her and it was a unique experience watching her work. Thanks to Chris Manning, my second advisor, for being helpful in a number of ways despite the incredible demands on his time this year, particularly in providing advice with statistics, and helping me to search corpora, and setting up software for searching them, and for reading what I wrote critically, and in general for being down-to-earth. Thanks to both Chris and Joan for giving me the opportunity to work on such an interesting project. I have really enjoyed our meetings and watching the ideas in the project take shape. Thanks especially for a wonderful trip to Hong Kong. And thanks to Tom Wasow for helping me get involved with the project and for teaching, with Ivan Sag, an introductory syntax class which I very much liked. A very special thanks to Ivan Sag, who is almost entirely responsible, along with Tom Wasow, fo

    Soft Constraints Mirror Hard Constraints: Voice and Person in English and Lummi

    The same categorical phenomena which are attributed to hard grammatical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. The effects of a hierarchy of person (1st, 2nd, 3rd) on grammar are categorical in some languages, most famously in languages with inverse systems, but also in languages with person restrictions on passivization. In Lummi, for example, the person of the subject argument cannot be lower than the person of a nonsubject argument. If this would happen in the active, passivization is obligatory; if it would happen in the passive, the active is obligatory (Jelinek and Demers 1983). These facts follow from the theory of harmonic alignment in OT: constraints favoring the harmonic association of prominent person (1st, 2nd) with prominent syntactic function (subject) are hypothesized to be present as subhierarchies of the grammars of all languages, but to vary ..